Task-based Evaluation of Multiword Expressions: a Pilot Study in Statistical Machine Translation
Abstract
We conduct a pilot study of task-oriented evaluation of Multiword Expressions (MWEs) in Statistical Machine Translation (SMT). We propose two different strategies for integrating MWEs into SMT, which take advantage of different degrees of MWE semantic compositionality and yield complementary improvements in SMT quality on a large-scale translation task.
Related resources
Mining a Bilingual Lexicon of MultiWord Expressions : A Statistical Machine Translation Evaluation Perspective (Acquisition de lexique bilingue d'expressions polylexicales: Une application à la traduction automatique statistique) [in French]
This paper describes a method aiming to construct a bilingual lexicon of MultiWord Expressions (MWEs) from a French-English parallel corpus. We first extract monolingual MWEs from each part of the parallel corpus. The second step consists in acquiring bilingual correspondences of MWEs....
Improving Statistical Machine Translation Using Domain Bilingual Multiword Expressions
Multiword expressions (MWEs) have proven useful for many natural language processing tasks. However, how to use them to improve the performance of statistical machine translation (SMT) is not well studied. This paper presents a simple yet effective strategy to extract domain bilingual multiword expressions. In addition, we implement three methods to integrate bilingual MWEs into Moses, the state...
Integration of Reduplicated Multiword Expressions and Named Entities in a Phrase Based Statistical Machine Translation System
Language-specific multiword expressions (MWEs) play important roles in many natural language processing (NLP) tasks. The present work reports on integrating reduplicated multiword expressions (RMWEs) into Phrase Based Statistical Machine Translation (PBSMT) to improve translation quality between Manipuri, a highly agglutinative Tibeto-Burman language, and English. In addition, Multiw...
A System for Compound Noun Multiword Expression Extraction for Hindi
Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...
Translation of Multiword Expressions Using Parallel Suffix Arrays
Accurately translating multiword expressions is important to obtain good performance in machine translation, cross-language information retrieval, and other multilingual tasks in human language technology. Existing approaches to inducing translation equivalents of multiword units have focused on agglomerating individual words or on aligning words in a statistical machine translation system. We p...